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1.  Introduction. 


The  SOR  method  (successive  overrelaxation)  is  a  standard  iterative  method  for  solving  linear 
systems  of  equations,  particularly  large  sparse  systems  arising  from  partial  differential  equations. 
Convergence  of  the  method  is  greatly  affected  by  the  choice  of  overrelaxation  parameter  to.  A  stan¬ 
dard  model  problem  for  analyzing  SOR  is  the  system  of  equations  arising  from  a  finite  difference 
discretization  of  Laplace’s  equation  on  a  rectangle  with  zero  boundary  data.  The  solution  of  this  prob¬ 
lem  is  identically  zero;  hence,  the  iterates  of  SOR  also  represent  the  error  at  each  step.  The  conver¬ 
gence  properties  of  SOR  for  the  model  problem  also  apply  to  Poisson’s  equation  with  general  Dirichlet 
boundary  data,  since  the  errors  will  still  satisfy  the  homogeneous  equation. 

Using  the  five-point  approximation  to  the  Laplacian  gives 

“y-ijk  +  My+ijt  +  +  K/jt+i  -4uyt  =0,  j,k  =  1,2,  •  •  •  ,N—1, 

Ujt  =  0,  j  =  0,N  ork=0,N,  (U 

where  u Jk  approximates  the  solution  u  ( x ,  ,vO  with  Xj  =  jh ,  y*  =  kh ,  and  h  =  UN .  This  gives  a  linear 
system  of  (N-l)2  equations  in  (A/ -l)2  unknowns.  The  exact  form  of  the  matrix  equation,  and  the  form 
that  the  SOR  iteration  takes,  depends  on  the  order  in  which  the  unknowns  ujk  are  arranged  in  the  vec¬ 
tor  of  unknowns.  Two  standard  orderings  are  the  natural  rowwise  (NR)  ordering  and  the  Red-Black 
(RB)  ordering,  in  which  the  grid  is  colored  in  a  checkerboard  fashion  and  the  Red  points  are  ordered 
before  the  Black  points  (by  rows  within  each  color).  For  the  model  problem,  the  optimal  to  and  rate  of 
convergence  are  the  same  for  both  of  these  orderings.  This  mode’  i"' b  em  was  analyzed  by 
Frankel[1950]  and  also  by  Young[1950]  who  gave  a  more  general  theory  >R  for  a  wide  class  of 
matrix  equations  in  which  the  matrix  is  "consistently  ordered". 

Another  standard  model  problem  is  obtained  by  using  the  nine-point  approximation  to  the  Lapla¬ 
cian: 

4“/-U  +  4“/+U  +  +  4k;jh.,  +  Uj-it- 1  + 

+  “y+ljk-l  +  “/+1J+1  ~  =  0,  j,k  =  1,2, 

“/*  =  0.  j  =0,N  or  it  =  0,  N 


(1.2) 
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Again  there  are  various  ways  to  order  the  unknowns,  but  none  of  these  leads  to  a  consistently  ordered 
matrix  and  so  the  theory  of  Young  does  not  apply.  Multicolor  orderings,  similar  to  the  RB  ordering 
mentioned  above  but  usually  involving  four  colors  for  the  nine-point  stencil,  are  of  particular  interest 
for  parallel  processing  applications.  Recendy  Adams  and  Jordan[1986]  have  studied  this  problem  in  a 
more  general  context  and  identified  72  distinct  four  color  orderings.  These  can  be  grouped  into 
equivalence  classes  that  are  known  to  have  the  same  convergence  behavior.  For  the  model  problem 
considered  here,  their  theory  reduces  these  to  three  classes  of  orderings  that  could  potentially  have  dif¬ 
ferent  convergence  rates,  although  the  actual  rate  was  not  determined  for  any  class. 

In  this  paper,  we  show  that  in  fact  two  of  these  classes  are  equivalent  and  also  have  the  same 
convergence  behavior  as  the  natural  rowwise  ordering.  The  third  class  is  shown  to  be  distinct,  with  a 
different  optimal  to  and  convergence  rate.  In  each  case  the  eigenvectors  of  the  iteration  matrix  are 
determined.  The  eigenvalues  (which  determine  the  convergence  rate)  are  found  in  terms  of  the  roots 
of  quartic  equations.  The  optimal  w  for  small  h  is  given  by  an  asymptotic  expansion  about  h  =  0,  and 
is  based  on  an  unproved  assumption  about  which  frequencies  dominate  the  error  decay.  However,  this 
approximation  agrees  very  well  with  values  obtained  numerically  for  small  h . 

In  Section  2  we  use  a  separation  of  variables  technique  to  determine  the  eigenvalues  and  eigen¬ 
vectors  for  the  NR  ordering.  The  resulting  quartic  equation  is  used  to  derive  the  expansion  for  the 
optimal  to.  Our  results  for  this  ordering  differ  from  those  given  by  Garabedian[1956],  We  explain 
why  both  results  are,  in  a  sense,  correct. 

In  Section  3  we  discuss  the  various  multicolor  orderings.  The  main  technique  we  use  is  a 
change  of  variables  from  n ,  the  iteration  number,  to  v,  the  "data  flow  time",  as  defined  by  Adams  and 
Jordan(1986],  The  fact  that  this  change  of  variables  can  be  used  to  simplify  and  relate  various  SOR 
methods  was  observed  by  LeVeque  and  Trefelhcn[1986].  They  present  a  simple  Fourier  analysis  of 
SO R  on  the  five-point  stencil  and  show  the  equivalence  of  NR  and  RB  using  a  change  of  variables 
motivated  by  Garabedian[1956]  that  is  equivalent  to  the  data  flow  times. 
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In  Section  4  we  analyze  a  pseudo-SOR  method  based  on  a  Red-Black  coloring  for  the  nine-point 
stencil.  Although  this  method  has  attractive  features  for  parallel  computers,  we  show  that  it  is  unsatis¬ 
factory,  being  an  order  of  magnitude  slower  than  the  true  SOR  methods  with  optimal  co. 

Finally,  in  Section  5,  we  briefly  discuss  line  SOR  methods  and  compare  their  convergence  rates 
with  the  point  SOR  methods  discussed  in  this  paper. 

2.  Analysis  of  the  natural  rowwise  ordering. 

For  the  nine  point  stencil  with  NR  ordering,  the  SOR  method  takes  the  form 

k,V  =  (1-co)  «ji  +  f  (k;j? It  +u;-i'*+u;+iJC+u;M1 )  (2.1) 

+  ^  (uj‘-U'+l  +uJ-l.k-l  +“j+U+!  )- 

We  assume  that  this  iteration  has  a  solution  of  the  form 


=\"w(Xj,yk). 


(2.2) 


Then  X  is  an  eigenvalue  of  the  iteration  matrix  and  the  vector  with  components 
w(XjJk)’  jJc-l,  ■  ■  ■  ,N- 1  gives  the  corresponding  eigenvector.  Plugging  (2.2)  into  (2.1),  cancel¬ 
ling  a  common  factor  of  X",  and  dropping  the  subscripts  on  x  and  y  gives 


Xw(x,y)  -  Xco 


j  w  (x-h  ,y  )+-^-w  (x-h  ,y-h  )+jw(x  ,y-h  )+~w(x+h  ,y-h ) 


j * (x+h  ,y  )+-^-w  (x-h  ,y  +h)+~w  (x  ,y  +A)+-^-w (x  +h  ,y+h ) 


+  © 

+  (1- co)  w(x,y). 


(2.3) 


The  eigenfunction  w  (x  ,y )  must  be  zero  on  the  boundary  of  the  unit  square  in  view  of  the  given  boun¬ 
dary  conditions. 

We  use  separation  of  variables  and  let  w  (x  ,y )  =  X(x)Y (y ).  We  also  set  &  =  X1/2.  Plugging  this 
into  (2.3)  and  dividing  by  X(x)Y(y )  gives 
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a2+a>-l  1 

a2Y(y-h)+Y(y+h) 

1 

a2Y(y-h)+Y(.y+h) 

X(x-h)+X(x+h) ' 

co  5 

y  (y) 

+  20 

Y(y) 

X(x) 

1 

-4 - 

a2X(x-h)+X(x+h) 

I 

5 

X(x) 

r 

(2.4) 


Now  let 


Y(y)  =  aylhsin  td> 


(2.5) 


to  get 


a2+co-l  2a  ,  X(x~h)  ^  X(x+h)  A 


(O 


(2.6) 


where 


.  1  2  a  , 

<Di  =  ya2+— cosn* 


We  note  that  <!>,  and  d>2  are  independent  of  x  and  y  and  let 

X(x)  =  (<ty4>2),/2*sin  fyc . 


(2.7) 


(2.8) 


Then 


<r>,X(x-/i)  +  <t>2XCc+/0 

X(x) 


=  2<&11/24>21/2cos  %h 


and  using  this  in  (2.6)  gives 


a  1  -  ~costi/i  =  2d>i/2<t>2/2  cos^h. 
a)  5 


(2.9) 


Squaring  (2.9),  using  (2.7),  and  rearranging  terms  gives  a  quarlic  equation  for  a: 


4  2 

a4  -  [-2-(ocosr)/j  +  —  a>2cos2^/icosr(/t]  a3 

-  [  2(1-  to)  — ^•u)2cos2ii/j  +  -L-co2cos2^/i(cos2ri/t+4)  ]  a2 

LtD 

4  2 

+  [—<*>(1—  (o)costiA  -  —  co2cos2^cost|/i]  a+  (1-  <o)2  =  0 

D 


(2.10) 
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An  eigenmode  of  the  iteration  matrix  thus  has  the  form 

“A  =  X\Y  (xj)Y  (yk)  (2n) 

_  a2 «+*  sin^r,  sinqy*. 

In  order  for  the  boundary  conditions  to  be  satisfied,  £  and  q  must  be  integer  multiples  of  rt.  The  eigen¬ 
value  is  X  =  a2,  where  a  is  a  root  of  (2.10).  Note  that  we  must  choose  the  correct  square  root  of  dyd>2 
in  (2.11)  (recall  that  we  squared  (2.9)  to  obtain  (2.10),  introducing  additional  solutions).  The  correct 
sign  for  <$>ln<&22  is  determined  by  the  requirement  that  (2.9)  be  satisfied,  and  this  gives  the  correct 
sign  for  ^  well. 

The  frequencies  £,  and  t|  each  range  over  the  values  7t,  2rt,  •  •  • ,  (N-l) y.  This  gives  a  set  of 

(jV-1)2/4  pairs  of  frequencies.  Corresponding  to  each  pair  (^,q)  there  are  four  roots  of  the  quartic 
(2.10).  In  general  these  roots  are  distinct,  and  so  we  obtain  (N- 1)2  eigenvalues  and  eigenvectors,  the 
correct  number.  When  the  roots  are  not  distinct,  principal  vectors  will  be  obtained  and  the  number  of 
eigenvectors  will  be  less  than  (N- 1)2.  Recall  that  for  the  5-point  discretization,  a  principal  vector  was 
associated  with  the  optimal  to,  (see  e.g.  Young(1971]).  We  make  no  attempt  here  to  determine  the 
principal  vectors  or  the  values  of  to  for  which  they  occur.  However,  by  continuity,  we  do  obtain  all  of 
the  eigenvalues. 

yv-i 

The  frequencies  (— — +1)«,  •  •  • ,  which  one  might  expect  to  be  included  as  well,  give 

repeats  of  the  eigenvectors  already  found.  Replacing  £  by  Nn-^  leaves  (2.10)  unchanged  while 
replacing  q  by  N n-q  simply  negates  the  coefficients  of  a  and  a3.  In  either  case  the  squares  of  the 
roots  are  unchanged.  The  eigenmodes  (2.1 1)  are  also  unchanged  by  these  frequency  reflections. 

The  convergence  rate  of  the  method  (for  fixed  ca)  is  determined  by  the  spectral  radius  of  the 
iteration  matrix,  which  is 

p  =  max  IX(^,q)l. 

To  determine  the  optimal  to,  we  need  to  minimize  p  over  to. 
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To  help  characterize  the  roots  of  (2.10),  we  first  solved  the  quartic  numerically  for  various 
values  of  the  parameters.  For  example,  the  solid  lines  in  Figure  2.1  (the  x’s  will  be  explained  later) 
show  the  magnitude  of  the  four  roots  plotted  as  a  function  of  co  when  4=q=7i  for  h=l/10,  1/100,  and 
1/1000.  When  co  is  small  there  are  four  real  roots.  As  co  increases,  two  of  the  roots  become  complex 
conjugates.  The  optimal  co  apparently  occurs  when  these  complex  roots  intersect  the  largest  real  root. 
As  omega  increases  further,  two  more  roots  become  complex  conjugates  and  near  co=2  there  are  two 
complex  conjugate  pairs.  The  same  behavior  was  observed  for  smaller  values  of  h  using  various 
values  of  J;  andt). 

In  all  our  experiments  with  various  values  of  h ,  £=^=71  produced  the  largest  root  at  each  to  and 
hence  the  optimal  co  can  presumably  be  determined  by  minimizing  the  roots  for  this  particular  combi¬ 
nation  of  frequencies.  This  is  consistent  with  what  is  known  to  be  true  for  the  five-point  model  prob¬ 
lem,  but  we  have  not  been  able  to  prove  that  this  is  correct  for  the  nine-point  stencil. 


These  observations  suggest  a  strategy  for  determining  the  optimal  omega  for  small  h  when 
We  first  find  the  roots  of  (2. 10)  when  h  =  0  and  then  expand  about  these  roots  and  equate  the 
modulus  of  the  roots  to  get  the  optimal  co.  As  h  — » 0,  the  optimal  co  approaches  2.  Setting  h  =  0  and 
co  =  2  in  (2.10)  gives 


The  roots  of  (2. 12)  are 


4  48  3  46  2  48  .  n 

a  -  — cr  +  —  a  -  —  a+  1  =  0. 
25  25  25 


(2.12) 


where  9=cos  ‘(-1/25). 


0^=1,  a2=l,  a3=e'9,  ct4=e'lS 


(2-13) 


First,  we  expand  the  real  root  at  co0(*  about  0^=1.  We  set 

a=l  -  C\h  —  c  2*  2  , 
co  =  2-Aj/i  - k2h 2 


(2.14a) 

(2.14b) 


and  substitute  into  (2.10).  After  equating  coefficients  of  h  2  we  get 
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13c  i2  -  1 5X: !  c !  +  1 8rc2  =  0 ,  (2.15) 

which  determines  c  i  as  a  function  of  k  t. 

Next  we  expand  about  c10,  setting 

a  =  e,0(l-ph)  (2.16) 

and  again  take  to  of  the  form  (2.14b).  Substituting  into  (2.10)  and  equating  coefficients  of  h  gives 


4i8  144  -3<9i  92  -2i8  ;8\_t  /  28  -3ie  46  -2i9 


P(4e--^e*8+_e 


8_  ZZ.e-9)  =  lc  - _e. 

25  ;  ll25  25 


V  — ei0-2 ) 

25 


which  determines  p  as  a  function  of  k  t: 


(2.17) 


P  =  .423077  ! .  (2.18) 

Recall  that  X  =  a2  is  the  eigenvalue  we  seek,  so  we  equate  I  ot2l  from  (2.14a)  and  la2l  from  (2.16)  to 
highest  order  to  get 


1  -  2c  ih  +  0  (A2)  =  1  -  2p/i  +  0  (A2), 

and  so  c  i=p.  Plugging  this  into  (2.15)  and  using  (2.18)  allows  us  to  solve  for  k  j, 

*,  =2.11624*. 


Consequently, 


(2.19) 


C!  =  .89533rt.  (2.20) 

Using  these  values  in  (2.14a)  and  (2.14b),  we  find  that  the  optimal  co  and  the  corresponding  spectral 
radii  have  approximate  values 


0)opl  =  2  -  2.1 16247th  , 

Popr  =  <,  =  1  -  2c  xh  =  1  -  1.79066ttA 

for  small  A. 

For  comparison,  the  corresponding  values  for  the  five-point  model  problem  are 


(2.21) 


co*?*  =  2-2nh  , 
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P$?(  =  1  -  2nh. 

Notice  that  for  the  nine-point  stencil,  the  spectral  radius  is  slightly  larger,  giving  somewhat  slower 
convergence  than  for  the  five-point  stencil,  although  the  two  are  very  close.  Most  importantly,  both 
arc  1  -  0{.h)  as  A— > 0,  giving  asymptotically  the  same  order  of  convergence.  By  contrast  the  Jacobi 
and  Gauss-Seidel  methods,  and  also  the  pseudo-SOR  method  analyzed  in  Section  4,  have  spectral 
radius  \  -Oih2)  as  h  — >  0. 

It  is  very  interesting  to  compare  these  results  with  asymptotic  results  obtained  by  Gara- 
bcdian[1956],  especially  since  they  do  not  agree  and  yet  both  are,  in  a  sense,  correct  Garabedian’s 
analysis  is  based  on  viewing  the  SOR  iteration  (2.1)  as  a  finite  difference  method  for  a  time-dependent 
PDE.  Expanding  in  Taylor  series  shows  that  this  difference  equation  is  consistent  with  the  PDE 

5CUf  +  lUj,  +  3  My,  =  3  Ua  +  3  Uyy  Q  22^ 

u  =  0  on  the  boundary 


where  C  and  co  are  related  by 


(2.23) 


If  we  fix  C  >0  and  choose  to  according  to  (2.23)  for  each  h>0  then  0<cik2  and  so  the  method  (2.1)  is 
stable.  Since  it  is  consistent  with  the  linear  equation  (2.22),  iterates  mJ*  with  n=T lh  must  converge  to 
solutions  u(xjjk,T )  of  the  PDE  (2.22)  as  h->0  (by  the  Lax  Equivalence  Theorem)  if  we  choose  ujt 
by  discretizing  fixed  initial  data  u(x,y, 0).  Consequently,  studying  the  decay  of  solutions  to  (2.22) 
gives  information  about  the  rate  of  convergence  of  SOR. 

By  introducing  the  change  of  variables 


1 

(2.22)  is  transformed  to 


s  - 1 


L 

2 


5  Cu,  + 


=  3«„  +  3  Uyy 


(2.24) 


(2.25) 
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Separation  of  variables  shows  that  e'^enmodes  of  this  PDE  have  the  form 

u(x,y  ,s)  =  e~ps  sin  &sinr\y  (2.26) 

where  %  and  q  are  integer  multiples  of  n  and  p-p  (£,t|)  is  a  root  of  the  quadratic  equation 

-||p2-5Cp+3(42+Ti2)  =  0.  (2.27) 

Transforming  back  to  the  original  time  variable  gives 

u(x,y,t)  =  e-p^xl3*y,2)sm %x sin  qy  .  (2.28) 

Note  that  in  a  time  step  of  length  h,  this  solution  decays  by  a  factor  e'Re(pi*.  The  eigenmode  with 
slowest  decay  is  obtained  by  taking  £=q=7i  and  the  negative  square  root  in  solving  (2.27)  giving 

P  nun  =  ~  (sc  -  '25C2-26ti2  ]  (2.29) 

If  we  obtain  initial  data  for  SOR  by  discretizing  the  corresponding  eigenfunction, 
ufk=u(Xj,yk, 0)  from  (2.28),  it  follows  (by  convergence)  that  the  decay  factor  for  the  SOR  iteration 
must  have  the  form 

^,=  l-*e(Pmin)*+0(A2)  (2.30) 

as  h  -»0.  Since  taking  other  eigenfunctions  as  initial  data  gives  faster  decay,  one  is  led  to  the  conclu¬ 
sion  that  in  order  to  obtain  the  fastest  possible  convergence,  we  should  maximize  Reip^)  and  hence 
minimize  X*,,,.  Recall  that  the  value  of  C  is  still  at  our  disposal.  We  can  minimize  Re(pm ;„)  by  set¬ 
ting  the  radical  to  zero  in  (2.29),  giving 

C  =  — p-n  =  1.02rt,  P  mm  =  67W2/U  =  2.35jt .  (2.31) 

By  (2.23)  and  (2.30)  we  obtain  the  following  predictions  for  the  optimal  to  and  the  corresponding 
decay  rate,  as  in  Garabedian[1956]: 

<*  =  =  2(1-C7t)  =  2  -  2.04 7t/i 

p^  =  rp^^l-Pminh  =  l-235Kh. 


(2.32) 
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These  values  do  not  agree  with  the  values  (2.21)  found  by  computing  the  eigenvalues  of  the 
iteration  matrix.  The  reason  is  the  following.  While  (2.30)  does  indeed  give  a  correct  expression  for 
the  largest  eigenvalue  of  the  iteration  matrix  corresponding  to  a  decay  factor  for  the  PDE,  there  are 
other,  spurious,  eigenvalues  of  the  discrete  problem  that  have  larger  magnitude  for  to  near  2  and  hence 
determine  the  spectral  radius  p.  This  is  seen  clearly  in  Figure  2.1  where  we  have  plotted  \e~ph  I  for 
the  two  roots  of  the  quadratic  (2.27)  (as  +’s)  along  with  IXI ,  the  magnitudes  of  the  actual  eigenvalues 
obtained  by  solving  the  quartic  (2.10)  (as  the  solid  lines).  One  pair  of  discrete  eigenvalues  closely 
matches  \e~ph  1  for  small  h  while  the  other  (complex  conjugate)  pair  does  not.  In  fact,  Garabedian’s 
results  may  also  be  obtained  from  our  approach  by  choosing  k  i  in  (2.15)  to  maximize  C\,  thereby 
ignoring  the  effect  of  the  other  root. 

For  each  Gxed  h ,  we  can  choose  initial  data  (namely,  as  a  spurious  eigenvector)  so  that  conver¬ 
gence  is  slow  and  determined  by  the  spectral  radius.  On  the  other  hand,  these  spurious  eigenvectors 
become  highly  oscillatory  and  are  not  convergent  as  /t— >0.  Consequently,  if  we  obtain  our  initial  data 
by  discretizing  a  Gxed  funcuon  of  x  and  y  at  each  h  (as  is  more  realistic  in  practice),  we  would  expect 
to  see  vanishingly  small  components  of  these  spurious  eigenvectors  as  h-tO.  For  pracdcal  purposes, 
then,  the  values  (2.32)  obtained  by  Garabedian  may  be  more  meaningful  and  useful  than  the  "true" 
values  (2.21). 

This  is  demonstrated  in  Figure  2.2  where  we  show  the  decay  of  1 1  u  1 1 2  for  various  initial  data. 
For  initial  data  obtained  by  discretizing  the  smooth  data  u(x  ,y)=(x2-x)(y2-y),  the  observed  decay  is 
much  closer  to  X^„,  as  predicted  by  (2.30),  than  to  p". 

To  verify  that  the  eigenvector  corresponding  to  the  spectral  radius  is  highly  oscillatory,  we  note 
that  by  (2.5)  and  (2.8)  an  eigenve/"  -  has  the  iTrm 

=at[(<tl/02)I/2]/sin^sinqyt.  (2.33) 

At  die  point  0)^,,  the  maximum  eige.  '".iue  corresponds  to  a  value  of  a  given  by  (2.16).  Inserting  this 
in  (2.33)  gives 
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wjk  =  e'et(l  -  pA)t[(<V<X>2)1,Vsin^sinT^ .  (2.34) 

Since  0  =  cos-1(-l/25),  this  function  will  be  oscillatory  in  k  and  nonconvergent  as  h  — »  0. 

By  contrast,  the  eigenvector  corresponding  to  the  other  pair  of  roots  converges  to  the  eigenfunc¬ 
tions  (2.28)  of  the  PDE  as  h  -*  0.  For  these  vectors  our  previous  arguments  show  that  X  has  the  form 
of  (2.30)  and  hence  a,  the  square  root  of  X,  can  be  expressed  as 

a=  1-  ~ph  +0(h2).  (2.35) 

Expanding  using  this  value  of  a  in  (2.7)  shows  that 

<ty<t>2  =  1  -  ph+0(h2).  (2.36) 

Plugging  (2.35)  and  (2.36)  into  (2.1 1)  gives 

uji  =(1  -  i ph  +0(h2))2A+k(l  -  y ph  +0{h*))>s\ntjh  sin r\kh  (2.37) 

=  (1  -  ph  +  O (h2))l'*'n*ins\ntjh  %\nr\kh  . 

As  h  0  with  t  =  nh ,  x  =  jh  and  y  =  kh  fixed,  this  approaches  the  eigenmode  (2.28)  of  the  PDE. 
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Figure  2.1.  Solid  lines  show  jaz|  =  |\|  for  the  iteration  matrix  with 
f  =  r)  =  tt,  obtained  by  solving  the  quartic  (2.10).  The  symbols  +  show  e~ph 
where  p  is  a  root  of  the  quadratic  equation  (2.27)  with  $  =  tj  =  tt. 

(a)  h  =  ,  1  for  0  £  u  £  2. 

(b)  h  =  .01  for  0  £  u  £  2. 

(c) h  =  .01  for  1.9  £  u  £  2. 

(d)  k  =  .001  for  1.99  £  u  £  2. 


Figure  2.2.  Convergence  history  (2-norm  of  error  versus  iteration  number)  for  the 
NR  ordering  with  h  =  0.05  and  gj  =  1.86.  Three  choices  of  initial  data  are  compared: 

(a)  An  eigenvector  corresponding  to  the  spectral  radius. 

(b)  An  eigenvector  corresponding  to  the  largest  nonspurious  eigenvalue. 

(<=)  1$  =  (*/  -  Xj)(y?  -  yk). 

Note  that  the  realistic  initial  data  (c)  more  closely  matches  (b)  them  (a). 


3.  Multicolor  SOR. 


In  this  section  we  consider  the  SOR  method  applied  to  the  nine-point  model  problem  with 
several  alternative  orderings  of  the  unknowns  ujk.  These  orderings  are  determined  by  labeling  the  grid 
points  with  four  different  colors  (Red,  Black,  Green  and  Orange)  and  then  ordering  the  points  by  first 
listing  all  the  points  of  one  color,  then  a  second  color,  and  so  on.  The  overall  ordering  of  grid  points  is 
determined  by  two  factors:  a)  the  manner  in  which  the  grid  points  are  labeled  (the  coloring  of  the  grid) 
and  b)  the  order  in  which  the  colors  are  taken  (the  ordering  of  the  colors). 

Four-colorings  are  of  interest  for  the  nine-point  stencil  because  with  four  colors  it  is  possible  to 
decouple  the  grid,  in  the  sense  that  the  resulting  SOR  formula  for  updating  a  grid  point  of  any  given 
color  involves  neighboring  grid  points,  all  of  which  have  different  colors  than  the  center  point.  This  is 
advantageous  in  parallel  processing  applications  since  all  grid  points  of  the  same  color  can  be  updated 
simultaneously. 

For  the  five-point  stencil,  two  colors  suffice  to  decouple  the  grid  using  the  RB  checkerboard  pat¬ 
tern  discussed  in  Section  1.  Recently  LeVeque  and  Trefelhen[1986]  have  shown  an  easy  way  to 
analyze  the  five-point  model  problem  using  a  change  of  variables  from  the  iteration  number,  n ,  to  the 
earliest  time,  v,  that  the  unknown  at  a  grid  point  can  be  updated  assuming  one  update  requires  one  time 
unit.  This  variable  v  corresponds  to  the  "data  flow  times"  discussed  in  Adams  and  Jordan[1986]  and 
closely  resembles  the  change  of  variables  used  by  Garabcdian[1956]  to  analyze  the  PDE.  This  change 
of  variables  allows  the  use  of  Fourier  analysis  to  determine  the  convergence  rate  and  optimal  co.  It 
also  gives  a  straightforward  proof  of  the  equivalence  of  the  NR  and  RB  orderings. 

Here  we  use  this  same  approach  to  analyze  the  four-color  orderings  for  the  nine-point  stencil. 
Before  introducing  these  orderings,  we  briefly  review  the  analysis  for  the  five-point  NR  and  RB  order¬ 
ings  to  introduce  notation  and  motivate  our  nine-point  analysis. 

For  the  five-point  model  problem  (1.1)  with  the  NR  ordering,  the  SOR  iteration  takes  the  form 

«A+I  =  (1-0))  u%  +  f  (utft+uftU • 


(3.1) 
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The  stencil  for  updating  a  grid  point  on  iteration  n+1  using  the  NR  ordering  is  given  in  Figure  3.1 


n 


n+1 


-  n 


Figure  3.1.  NR  Stencil  in  Variable  n 


To  assist  in  determining  the  change  of  variables,  the  earliest  times  at  which  an  unknown  can  be 
updated  on  the  first  two  iterations  using  the  stencil  in  Figure  3.1  are  listed  below  each  node  in  Figure 
3.2. 


4,6 

5,7 

6,8 

7,9 

3,5 

4,6 

5,7 

6,8 

2,4 

3,5 

4’, 6 

5,7 

1,3 

2,4 

3,5 

4,6 

Figure  3.2.  Times  for  2  Iterations  of  NR 


These  times  define  the  iteration  variable  v.  Each  node  in  Figure  3.2  is  updated  at  time  v+1  by  the  sten¬ 
cil  shown  in  Figure  3.3, 

v 

I, 

V  -  v— 1  -  V 

I 

V 

Figure  3.3.  NR  Stencil  in  Variable  v 
and  the  corresponding  SOR  iteration  is 

ujk+l  -  (!—<*>)  «;*“l  +  —•  (‘tj'-i'i+Uj'jt-i  +M/Vt  +M/+u)  •  (3.2) 
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Figure  3.2  shows  that  the  times  along  lines  j+k  are  constant  and  that  iterations  in  variable  n  occur 
every  2  time  units.  Hence,  the  proper  change  of  variables  between  (3.1)  and  (3.2)  is 

v  =  2n  +  j  +  k  -  2.  (3.3) 

The  advantage  of  this  change  of  variables  is  that  the  eigenmodes  of  (3.2)  are  easy  to  determine  - 
they  are  simply  Fourier  modes  of  the  form 

ujk  =  g  'sin tyj  sinqy*.  (3.4) 

These  grid  functions  satisfy  the  boundary  conditions  provided  that  %  and  q  are  integer  multiples  of  7t, 
and  substituting  into  (3.2)  gives  the  following  equation  for  g : 

g2  =  (l-co)  +  -^-(cos2;/i+cosqA)-  (3.5) 

For  each  t,  and  T|,  this  equation  has  two  solutions,  giving  two  eigenmodes.  We  obtain  all  modes  by  let¬ 
ting  ^  and  q  range  over  the  values  q  =  n,  2k,  ....  (Af-l)rr,  for  a  total  of  2(7/ -l)2  modes.  This  is 
correct  since  (3.2)  requires  two  previous  levels  of  data  to  calculate  ujk+\ 

By  the  change  of  variables  (3.3),  an  eigenvalue  X  of  (3. 1)  is  seen  to  be  g  2  and  the  corresponding 
eigenvector  has  components  g'+tsini^  sinqy*.  We  now  appear  to  have  twice  as  many  eigenmodes 
as  required  for  (3.1),  but  here  each  mode  is  repeated  since  replacing  (^,q)  by  (Nn-^,Nn- q)  simply 
negates  the  roots  of  (3.5).  Since  X  =  g 2  this  reflection  leaves  these  eigenvalues  unchanged.  The  eigen¬ 
vector  is  also  unchanged  since 

(-g  y+tsin(N  k-Qxj  sin(//jt-q)yi  =  g/+*si,i£xj  sinqy*. 

Equation  (3.5)  gives  the  famous  relationship  between  an  eigenvalue  X  of  SOR  and  an  eigenvalue 
H=y  (cos^/i+cosq/i)  of  Jacobi: 


X+oM^xi^ 


(3.6) 


Equation  (3.6)  can  be  used  to  determine  the  optimal  value  of  to  as  a  function  of  p.  (see  e.g. 
Young{1971))  and  will  not  be  repeated  here. 
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A  similar  analysis  can  be  done  for  the  Red/Black  ordering  of  the  grid  points.  In  the  variable  n 
there  are  two  stencils,  one  for  the  Red  nodes  and  one  for  the  Black  ones,  as  given  in  Figure  3.4. 


n  n+1 

n - n - n  n+1 _ ) _ n+I 

n  n+1 

Red  Black 

Figure  3.4.  Red/Black  Stencil  in  Variable  n 

The  corresponding  iteration  is 

Hyr*  =  (l-“)  +  J  («  ",t+t  +«/-u+“Au) 

b  :  «, i+1 = (i-oo)  Ujk  +  f  +«;;!,  +u;_v<u)  • 

The  earliest  times  corresponding  to  Figure  3.4  are  given  in  Figure  3.5. 
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Figure  3.5.  Times  for  2  Iterations  of  RB 


(3.7) 


In  the  data  flow  variable,  v,  both  R  and  B  nodes  have  the  same  stencil;  namely,  the  stencil  of  Figure 
3.3.  Equation  (3.2)  gives  the  update  formula  for  all  nodes.  Hence,  (3.5)  and  (3.6)  also  hold  for  the 
R/B  ordering.  Figure  3.5  shows  that  the  times  along  all  lines  with  j+k  even  are  equal  and  similarly  for 
j  +k  odd  and  iterations  in  variable  n  occur  every  2  time  units  in  v.  Therefore,  the  change  of  variables 
is 
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v  =  2 n  +  (J+k)  mod  2-1 


(3.8) 


and  the  eigenvector  components  of  this  iteration  corresponding  to  4,t|  are 

fsin^xysinrt^i 
A.1/2sin4xysimqyt 

for  the  R  and  B  nodes  respectively.  This  agrees  with  that  given  in  Young[1950]. 

We  now  turn  to  the  nine-point  stencil,  with  orderings  based  on  four  colors.  The  results  of 
Adams  and  Jordan[1986]  applied  to  this  model  problem  show  that  the  72  distinct  four  color  orderings 
can  be  grouped  into  three  equivalence  classes  with  regard  to  convergence  behavior: 


Ordering  #1:  The  grid  is  colored  as  in  Figure  3.6a  with  ordering  R/B/G/O. 

Ordering  #2:  The  grid  is  colored  as  in  Figure  3.6b  with  ordering  R/B/G/O. 

Ordering  #3:  The  grid  is  colored  as  in  Figure  3.6b  with  ordering  R/B/O/G. 
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Figure  3 

.6a. 

Figure ! 

3.6b. 

We  will  show  that  Orderings  #1  and  #2  are  in  fact  equivalent,  both  being  equivalent  to  the  NR 
ordering  discussed  in  Section  2.  Ordering  #3  is  different  however,  and  gives  slightly  slower  conver¬ 
gence  based  on  the  spectral  radius  and  slightly  faster  convergence  based  on  the  eigenvalue  that  dom¬ 
inates  the  iteration  in  practice. 

Ordering  #1:  Figure  3.7  shows  the  update  times  for  this  ordering  which  define  the  data  flow 


variable  v. 
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Figure  3.7.  Times  for  two  iterations  with  Ordering  #1 
In  this  variable,  each  node  has  the  same  stencil  with  update  formula 


The  change  of  variables  is  given  by 

v  =  4n  +  c  -  3 

where  c  =0, 1, 2, 3  for  the  R,  B,  G,  and  O  nodes  of  Figure  3.7,  respectively. 

To  see  that  this  ordering  is  equivalent  to  the  NR  ordering,  we  only  need  to 
times  for  the  NR  ordering,  shown  in  Figure  3.8. 
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Figure  3.8.  Times  for  the  9  point  NR  Ordering 


(3.9) 


(3.10) 


look  at  the  update 


If  we  define  the  variable  v  by  these  times,  we  again  obtain  iteration  (3.9)  with  the  change  of  variables 
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v  =  4/i  +2k  +j  -  6.  (3.11) 

Since  a  change  of  variables  gives  (3.9)  in  either  case,  the  eigenvalues  and  hence  convergence  behavior 
will  be  identical.  If  g  is  an  eigenvalue  of  (3.9)  then  X.  =  g  4  is  an  eigenvalue  of  both  NR  and  Ordering 
#1. 


The  eigenvectors,  however,  will  not  be  the  same  since  a  different  change  of  variables  is  used  in 
each  case.  The  eigenvectors  for  the  NR  ordering  were  determined  in  Section  2.  Using  these  and  the 
above  changes  of  variables  allows  us  to  determine  the  eigenvectors  for  Ordering  #1.  In  analyzing  (3.9) 
we  view  it  as  applying  to  all  mesh  points  (jjc)  at  each  level  of  v,  although  in  our  applications  it  is 
applied  only  to  points  of  a  single  color  in  each  step.  (Note  that  we  could  apply  it  to  all  points  without 
affecting  the  results,  but  that  the  work  required  would  be  increased  by  a  factor  of  4.)  Since  (3.9) 
requires  four  levels  of  prior  data  to  determine  ujlu,  an  eigenvector  of  (3.9)  consists  of  4(N-1)2  values. 


V0' 

V1 

v2 

V3 

where  Vv  =  {V/J  for  j ,k  =  1,2,  •  •  •  ,jV-1.  If  g  is  the  corresponding  eigenvalue,  then 


V1' 

V0' 

V2 

V1 

V3 

=  g 

V2 

v 4 
■ 

V 3 

This  indicates  that  Vv+1  =  gVv  for  each  v  and  hence 


V  = 


'  v01 

gV° 

*V° 

g3v° 


If  we  now  let  V>°  be  the  vector  consisting  only  of  the  values  V)*  for  which  (JJc)  is  a  red  point,  and 
similarly  for  V$,  Vq  and  V§,  then  an  eigenvector  of  the  original  iteration  in  the  n  variable  (with  Ord¬ 
ering  #  l)  has  the  form 
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VK° 

gVa 

V<*»  =  a./0  e /?<"-'>’  (3-12) 

8  vc 
„3t/0 

with  eigenvalue  X  =  g4. 

On  the  other  hand,  by  the  change  of  variables  (3.1 1),  an  eigenvector  for  the  NR  ordering  in  the 
n  variable  has  the  form 

VjFR)  =  gn*j  V°k 

again  with  eigenvalue  X  =  g*.  Equation  (2.1 1)  gives  the  eigenvector  for  the  NR  ordering, 

Vyiw*)  =  a*[(<l>1/d>2)!/2K  sin^ty  sin-qy* 

where  <t>l  and  <>2  are  given  by  (2.7)  and  a  =  X1/2  =  g  2.  Using  this  we  obtain 

=  «~'[(<V<I>2)1/2]y  sin^ty  sirniy*. 

The  eigenvectors  arc  now  determined  by  (3.12)  with  T)=rt,  2rc, ....  As  before,  the  frequcn- 

A/-1 

cies  4.  '+l)ft. ... .  (N-1)k  give  repeats  of  the  eigenvalue-eigenvector  pairs  already  found. 

Ordering  #2:  For  Ordering  #2,  the  associated  earliest  times  for  the  first  two  iterations  are  given 


in  Figure  3.9. 
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Figure  3.9.  R/B/G/O  Times  for  Ordering  #2 

A  quick  inspection  of  Figure  3.9  shows  that  the  R  and  G  nodes  have  the  same  stencil  in  the  variable  v 
with  the  following  update  formula: 

R  ,G :  Ujl*4  =  (1-  to)  uji  +  +«/^+i ) 

+  *20  +u/-+id+i  +uy+i3jk-i  +u/+u+i  )■ 

Likewise,  the  B  and  O  nodes  have  the  same  stencil  with  update  formula: 

B  ,0 :  u,r*  =  (1-  to)  uf+f  («AV“mW£i  +«/& 

~2q  Of/-i  Jt -1  +u/-+U+l  +«*/? uk-l  +«/+U+l  )■ 

The  change  of  variables  from  n  to  v  is  again  given  by  (3.10),  where  c  =  0, 1, 2,  and  3  for  the  R,  B,  G, 
and  O  equations  respectively. 

Now,  the  equations  in  (3.13)  and  (3.14)  have  the  symmetry  needed  to  verify  that 
U^sin^CysimTy*.  Thus  the  methods  in  (3.13)  and  (3.14)  have  the  same  value  for  Vfk  but  different 
amplification  factors,  say  gx  and  g  2.  A  single  step  of  the  full  method,  in  variable  n,  consists  of  four 
sweeps,  two  with  (3.13)  and  two  with  (3.14)  and  hence  has  amplification  factor 

*.  =  ghl  (3-15) 

In  order  to  determine  X,  we  find  a =g  \gi  by  considering  two  sweeps  of  the  method,  Red  and 
Black,  say.  We  substitute  the  following  values, 


(3.13) 


(3-14) 
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UjVl=g2uji, 

U/'k+2  =  g\g2“,Vk> 

,,  v+3  _  n  _  2,,  v 
Ujk  ~  g  1  g  2  Ujk  < 

..  v+4  _  2  .  2..  v 

U;t  ~  g\g  2Ujk> 

v+S  _  -  2-  3,.  v 
~8\82ujk< 


(3.16) 


with 


u»  _  .<(£*/+ n>0 

ujk  c 


into  (3.13)  to  get 


g)2g  I  =  (!-“)  +  —■  (2g  2cos^/i  +2g  xg  2cosr|/i  +g  ,g  2  cos^/i  COST|/l ) .  (3.17) 

Next,  we  take  a  step  with  (3.14)  to  update  the  Black  nodes,  and  obtain  uji+5.  Plugging  (3.16)  into 
(3.14)  for  v+5  we  get 


8182  =(l-to)+  y  (2g  fg  jcos^h  +2g  ig  2C0st)/i  +g  icos^/i  cosr)h  ) .  (3.18) 

Equations  (3.17)  and  (3.18)  can  be  equated  and  g  t  and  g2  eliminated  to  give  a  quartic  equation  for  a. 
Surprisingly,  this  quartic  is  again  (2.10),  the  quartic  obtained  in  our  analysis  of  the  NR  ordering  by 
separation  of  variables.  This  shows  that  this  ordering  is  equivalent  to  the  NR  ordering,  and  hence  is 
also  equivalent  to  Ordering  #1. 

The  eigenvectors  in  variable  r.  can  be  seen  from  (3.10)  and  (3.16)  to  be 


V<*2>  = 


8  2Vg 

8i8zvg 

8182V0 


t  R  (.N-l)' 


(3.19) 


where  Vj°k=sin^xjsinr\yk,  q=rc,  2rr,  ...  ,  (A'-l)y  with  eigenvalue  X=g  xg2.  Again,  we  find  that 

N-\ 

eigenvalue-eigenvector  pairs  are  repeated  for  the  frequencies  (— — +l)rr, ....  (N -\)k. 
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Ordering  #3:  For  this  ordering,  the  earliest  times  for  the  first  two  iterations  are  given  in  Figure 


3.10. 
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Figure  3. 10.  R/B/O/G  Times  for  Coloring  #  1 


The  R  and  O  nodes  have  the  same  stencil  in  the  variable  v  with  the  following  update  formula: 


R , 0 :  u,r  =  (1-01)  Ujl  +  f  (ufth  +«/£,  +u/+\\+uJv*l\k) 

+  20  +UJ-U+l  +U/+Lk-1  +uj?l'jt+l ) 

Likewise,  the  B  and  G  nodes  are  updated  by  the  formula: 

b  ,g  :  ujt* = (l-  oi)  «/t  +  f  (Uj%  +«^+«A!*) 

+  20  +“y-+i  +M/>+u-t  +u/++U+i ) 

Now,  substituting  (3.16)  into  (3.20)  and  (3.21)  yields 


(3.20) 


(3.21) 


g\g2  »(l-tO)+  ~Y  ig  \g  2  COST|/l  +g  jCOS^/l )  +  ~  g  ig  2COST)/l  cos^/i  (3.22) 


and 


g  ig  2  =  (1-  (0)  +  -y-  (g  iCOST|fc  +g  fg  2cos^/i )  +  y  g  ig  2cos^/t  COST)*  .  (3.23) 


respectively.  As  before,  we  equate  (3.22)  and  (3.23),  eliminate  g j  and  g2  and  get  the  following  quar- 
lic  in  ot=g,g2: 
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2  4 

a4  -  (y  cocosr|h  cos^h  +  —  co2cosqhcosi;/i  )  a3 

+  (~-co2 cos^/icos2^  +  2(o-l) -  •£-  u2cos2t)/!  -  — -  co2cos2£/t)  a2  (3.24) 

J  Z  J  Zj 

2  4 

+  (y  o(l-  (a)cosr\hcos^h  -  —  o2cos^  cost|/i )  a  +  (o  -  1)2  =  0. 

The  change  of  variables  from  n  to  v  is  again  given  by  (3.10)  where  c=  0, 1,  2,  and  3  corresponding  to 
the  R,  B,  O,  and  G  equations,  respectively.  Following  the  same  arguments  as  before,  we  find  that  the 
eigenvectors  in  variable  n  for  Ordering  #3  are  given  by 


y(*3)  = 


VR° 

82Vb° 

gigM 

glg2VS 


■ R W-'t 


where  Vy°=  sin^sinriy*,  ^,Tl=7i,  2rc,  ....  (jV-l)y  and  eigenvalue  h=g  ,2g | •  Again,  we  find  that 

N-l 

eigenvalue-eigenvector  pairs  are  repeated  for  q=(— — +1)tc,  (IV-l)rr.  This  can  be  seen  for  the 

eigenvalues  from  (3.24)  since  replacing  (^,q)  by  (Nk-^,Nk-t\)  leaves  the  quartic  unchanged  and 
replacing  (&,r\)  by  either  or  (^.iVrr-Ti)  negates  the  coefficient  of  the  a  and  a3  terms. 

The  quartic  in  (3.24)  does  not  agree  with  the  quartic  in  (2.10)  and  the  roots  do  not  agree  in  gen¬ 
eral.  Consequently  the  optimal  to  and  corresponding  convergence  rate  are  different  for  this  ordering 
than  for  the  other  orderings  considered  so  far. 


Numerical  results  show  that  the  roots  of  (3.24)  have  the  same  qualitative  behavior  as  shown  in 
Figure  2.1  with  giving  the  slowest  decay.  The  optimal  in  again  appears  to  occur  where  the  two 
complex  roots  and  the  largest  real  root  intersect  When  h  =  0  and  to  =  2,  the  roots  satisfy 


4  36  3  22  2  36  ,  . 

a  -  — cr  +  — a  -  — a  +  1  =  0, 
25  25  25 


(3.25) 


and  are  given  by  (2.13)  where  9=cos~'(-7/25).  The  asymptotic  analysis  could  be  performed  as  before 
by  expanding  about  the  roots  of  (3.25),  but  this  has  not  been  carried  out  Instead  we  are  content  to  find 
by  numerically  solving  (3.24)  and  optimizing  the  result  This  leads  to 
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0^  =  2-  2.14*A  , 
popl  =  1  -  1.60*A 


(3.26) 


for  small  h.  This  R/B/O/G  SOR  iteration  was  programmed  and  the  results  of  (3.26)  confirmed.  By 
comparing  (2.21)  and  (3.26),  we  see  that  different  evaluation  orderings  for  the  same  coloring  of  the 
grid  points  can  lead  to  different  convergence  rates  based  on  the  spectral  radius. 

The  eigenvector  associated  with  the  spectral  radius  is  highly  oscillatory  as  the  mesh  is  refined 
(recall  this  was  true  for  the  rowwise  ordering  also).  An  analysis  similar  to  Garabedian’s  can  be  per¬ 
formed  to  find  the  convergence  behavior  for  smooth  initial  duia.  To  do  this,  we  expand  the  real  root  of 
(3.24)  at  cOgpt  about  cq=l  using  (2.14a)  and  (2.14b)  to  get  the  following  equation  which  determines  c  , 
as  a  function  of  ky. 


4c  i  —  5k  ic ,  +  6**  =  0  (3.27) 

Choosing  k ,  to  maximize  c  i  in  (3.27)  yields 


and 


*1  =  -|V6*=  1.95959* 


(3.28) 


2c  ,  =  ^6*  =  2.44949*. 

The  corresponding  values  of  p0*p,  and  0),^,  are 

p^=l-2.44949*/t, 
0)^  =  2-  1.9595 9*fi  . 


(3.29) 


(3.30) 


The  values  in  (3.30)  show  that  for  smooth  initial  data  this  ordering  is  preferred  over  the  rowwise  order¬ 
ing  and  Orderings  1  and  2.  Note  that  the  values  in  (3.26),  based  on  the  true  spectral  radius,  lead  to  the 
opposite  conclusion.  However,  the  results  of  (3.26)  are  valid  for  nonsmooth  initial  data.  Figure  3.11 
shows  the  decay  of  1 1  u  1 1 2  for  vaiious  initial  data.  For  initial  data  obtained  by  discretizing  the  smooth 
data  u  (x  ,y  )=(x 2-x  )(y  2-y ),  the  observed  decay  is  much  closer  to  that  predicted  by  (3.30),  than  to  p*. 

Another  ordering  of  the  coloring  in  Figure  3.6b  is  R/G/B/O.  This  leads  to  a  quartic  equivalent  to 
(2.10)  with  \  and  q  interchanged.  Hence,  for  a  square  grid  with  stepsize  h  in  both  the  x  and  y 
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directions,  and  pop<  for  this  ordering  are  also  given  by  (2.21),  Any  of  the  24  possible  orderings 
associated  with  Figure  3.6b  or  the  24  orderings  of  Figure  3.6a  can  be  easily  proved  (Adams  and  Jor- 
dan[1986])  to  have  the  same  eigenvalues  as  one  of  the  three  orderings  (R/B/G/O,  R/B/O/G,  R/G/B/O) 
we  have  analyzed  here.  In  addition,  there  is  a  third  coloring  of  the  grid  that  can  be  obtained  by  inter¬ 
changing  the  rows  and  columns  of  Figure  3.6a.  Any  of  the  possible  24  orderings  of  this  coloring  also 
can  be  proved  to  be  equivalent  to  one  of  the  three  we  have  analyzed.  Hence,  the  72  possible  four-color 
orderings  for  this  model  problem  can  be  grouped  into  two  different  equivalence  classes  that  character¬ 
ize  the  asymptotic  convergence  rate  behavior. 


i  terati  on 


Figure  3.11.  Convergence  history  (2-norm  of  error  versus  iteration  number)  for 
Ordering  /jf3  with  h  =  0.05  and  u  =  1.86.  Three  choices  of  initial  data  are  compared: 

(a)  Aji  eigenvector  corresponding  to  the  spectral  radius. 

(b)  An  eigenvector  corresponding  to  the  largest  nonspurious  eigenvalue. 

(c)  u,l  =  (x?  -  x/)(v*2  -  yk  )• 

Note  that  the  realistic  initial  data  (c)  more  closely  matches  (b)  than  (a). 


4.  A  nine-point  pseudo-SOR  method. 


We  now  consider  a  pseudo  SOR  method  with  the  stencil  in  Figure  4.1, 


and  iteration 


n 


n 


n-1 


Figure  4.1.  NR  Modified  Stencil  in  Variable  n 


My*+’  =  (1-W)  U%  +  •J"  («7.*il  +U/"-lJk+«/>l  Jk+“/".t+l  )  (4.1) 

+  ^  (“/"-l.Jt+1  +u/"~l  Jk-l  +uJ+lJt+l  +uJ+lJt-l ) 

which  differs  from  (2.1)  in  the  last  two  terms.  This  method  can  be  analyzed  using  the  techniques  of 
Section  3.  The  earliest  times  for  two  iterations  corresponding  to  Figure  4.1  are  equal  to  those  of  Figure 
3.2.  That  is,  the  iteration  expressed  in  terms  of  the  variable  v  is 

“ylT1  =  (f-w)  U?k-1  +  —  (w/jl-1  +“/-l.*+“/+l.t+“yVJk+t )  (4.2) 

+  "2Q  («/m!*+i  +“>Vu+i  +“y'+u-i ). 

with  the  change  of  variables  given  by  (3.3).  It  is  interesting  to  note  that  the  times  in  Figure  3.5  and  the 
iteration  in  (4.2)  are  also  obtained  for  a  Red/Black  ordering  of  the  grid  with  the  Red  and  Black  stencils 
shown  in  Figure  4.2. 
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Figure  4.2.  Red/Black  9-pt  Modified  Stencil  in  Variable  n 


Since  two  colors  do  not  decouple  a  grid  discretized  with  the  9-pt  stencil,  it  is  tempting  to  use  old  infor¬ 
mation  for  the  Black  to  Black  coupling  as  shown  in  Figure  4.2  to  obtain  a  method  suitable  for  parallel 
computers.  This  modification  was  considered  by  Kuo,  Levy,  and  Musicus[1986]  for  a  9-pt  stencil  aris¬ 
ing  from  a  discretization  of  a  PDE  with  a  cross-derivative  term.  They  show  that  the  convergence  rate 
of  SOR  for  their  problem  to  be  0  (h )  in  the  region  where  the  lowest  frequency  dominates.  We  show 
by  analyzing  (4.2)  that  the  use  of  only  two  colors  is  not  sufficient  for  the  9-pt  stencil  arising  from  the 
Laplacian.  In  particular,  we  will  show  that  the  method  converges  whenever  0<gx5/3,  that  the  optimal 
omega  occurs  where  the  lowest  and  highest  frequencies  cross,  and  that  the  rate  of  convergence  with 
the  optimal  omega  is  approximately  3 n2h2  for  small  h  as  opposed  to  \  .19nh  obtained  in  Section  3  for 
the  true  SOR  method  with  Ordering s  #1  or  #2. 

We  begin  by  observing  that  M,*=gvsin4x,simi)'*  is  an  eigenmcde  of  (4.2).  We  substitute  this 
into  (4.2)  to  get 

g  2  =  (I— (0)  +  -^y  (cos %h  +COST|/l )  +  y  COS^/l  COST)  A ,  (4.3) 

where  g 2  is  the  eigenvalue  of  the  method  in  (4.1)  or  the  Red/Black  method  depicted  in  Figure  4.2.  The 
eigenvectors  in  variable  n  for  the  NR  ordering  (4.1)  or  the  Red/Black  ordering  (Figure  4.2)  are  the 
same  as  the  respective  ones  given  in  Section  3  for  the  5-point  stencil. 

Equation  (4.3)  can  be  solved  for  g  to  get 

g  =  y-  ( cos  ZJi  +cost \h )  ±  Vyf  (cos  fj/i  +cost|/i  )2+(l-o))+y  cos^/i  cosq/i  .  (4.4) 

When  oo=l,  the  "pseudo  Gauss-Siedel"  method  has  amplification  factor 
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g±  = 


cos^h  +COST|/l  ^  +COST|/l  )2+-j(COS^/iCOST|/i  ) 


(4.5) 


which  is  maximized  when  ^=T)=7r.  The  maximum  value  is  easily  determined  from  (4.5)  be  cos2tc/i  . 
This  is  identical  to  the  spectral  radius  of  Gauss-Siedel  for  the  model  problem  with  the  5-pt  stencil.  The 
methods,  however,  differ  drastically  when  ro*l. 

To  determine  the  optimal  to,  we  must  minimize  the  maximum  modulus  of  g  in  (4.4).  Two  cases 
must  be  considered.  First,  assume  the  radical  in  (4.4)  is  negative.  Then 


lgl2  =  co  (1-—  cos^/t  COSTj/l)  -  1 


(4.6) 


and  for  convergence,  we  require  I  g  1 2  <  1  and  hence 


0<co<- 


1  -  y  COs£ftCOST|/t 


(4.7) 


As  h  — >0,  (4.7)  shows  that  the  method  is  divergent  for  co>5/3.  Also  (4.6)  shows  that  the  value  of  )g\2 
is  maximized  when  cos^/i=-cosqh .  This  occurs  when  T \=N-t,  (low  frequency  in  one  direction  and 
high  frequency  in  the  other)  and  this  maximum  is 


l«lmK  =  W(l  +  -jCOS27t/l)-l. 


(4.8) 


Ash-»0,  lytt-ll. 


Secondly,  assume  the  radical  in  (4.4)  is  positive.  Then,  the  maximum  value  of  g  occurs  when 
^=q=7t  and  is 


8  m,*  =  7y  <02COS2rt/l  +  (l-<0>  +  y  COS2*/! 


+  y  ©  cosn/i  “sj -y  a2cos2nh  +(l-co)  +  y  cos2ith  . 


(4.9) 


It  is  interesting  to  note  that  the  to  that  minimizes  (4.9)  occurs  when  the  radical  is  zero,  and  as  h  ->0, 
g>— »5/2.  That  is,  the  method  is  divergent  for  the  to  that  minimizes  of  (4.9)  for  the  lowest  fre¬ 
quency.  Recall  that  the  o>  that  is  optimal  for  SOR  for  consistently  ordered  matrices  corresponds  to  the 
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lowest  frequency.  For  the  pseudo  SOR  method,  the  optimal  0)  occurs  where  the  modulus  of  the  eigen¬ 
values  of  the  two  frequencies  i\=N-  %  and  T|=^=7t  are  equal.  This  to  is  determined  by  equating  (4.8) 
and  (4.9)  to  get 


16cj2cos27i/i 

25 


5 


COS27tA+G)-I 


-4(eo-l)2  =  0. 


(4.10) 


As  h  — >0,  co — >5/3,  so  we  look  for  a  solution  to  (4.10)  of  the  form 


to(A)  =  -f-  +  c  ih  +  c2h2  + 


(4.11) 


Substituting  (4.11)  into  (4.10)  and  equating  terms  yields 


ci  =  0, 


and  the  corresponding  values  of  the  optimal  (0  and  spectral  radius  of  (4.1)  are 


Popl  =  l-3^h2. 


(4.12) 


Comparing  (4.12)  to  cos27tA=l-7t2A2,  the  spectral  radius  of  the  pseudo  Gauss  Siedel  method,  shows 
that  as  h  -»0,  this  method  with  optimal  to  is  only  three  times  faster  than  with  w=l.  This  is  not  nearly  as 
good  as  the  true  SOR  methods  discussed  in  Section  3,  where  the  decay  factor  is  1-0 (A)  for  the 
optimal  to. 


5.  Comparison  of  point  and  line  methods. 
Let  the  system  Ax  =b  be  blocked  as 


'  Dx  -Un  .  .  -Uln 

“t 

V 

-L21  D2  .  •  -U  2m 

u2 

bi 

-L32  .  . 

■ 

= 

■ 

— 1  — f-m2  ■  • 

«m 

bn 

(5.1) 
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The  line  Jacobi  method  is  defined  as 


i<>  />«' 


(5.2) 


and  the  line-SOR  method  as 


D1u)"+1  =  co XLvu;+l  +  ®Zt/v«y"  +  +  (*“  “)  (5.3) 

;<i  >>• 

where  u;  corresponds  to  the  nodes  in  row  i  of  the  grid.  The  spectral  radius  of  the  Jacobi  method  for 
the  5-pt  and  9-pt  stencils  can  easily  be  found  by  separation  of  variables  since  sin^sinpy*  is  an  eigen¬ 
vector  of  iteration  (5.2).  These  results  are  given  in  Figure  5.1. 


Method 


Spectral  Radius 


5-pt  point 
5-pt  line 
9-pt  point 

9-pt  line 


COSTCh  =  1  -  y  7C 2h2 

<rpsnh_sl_^h2 

2-cosrc/i 

ycosrcji  +  y  cos 2nh  =  1  -  y  n2h  2 


yCOSTt/l  +  y  cos2nh 
1  -  y  cosnh 


~  1  -  n2h 


2u2 


Figure  5.1.  Spectral  Radius  of  Point  and  Line  Jacobi  Methods 


The  spectral  radius  of  the  line-SOR  method  for  the  5-pt  and  9-pt  stencils  can  now  be  found  using 
Young’s  theory  for  block-consistently  ordered  matrices.  That  is,  if  p  is  the  spectral  radius  of  line- 
Jacobi  then  and  for  line-SOR  are  given  by 

2 

a,opt~  1  +  V 1  _  JJ.2  (5.4) 

Pop.  =  Wop,  -  1 . 


These  SOR  results  are  summarized  in  Figure  5.2  below  where  Garabedian’s  results  for  the  9-pl  point 
methods  are  given  in  parenthesis. 
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Method 

Ordering 

Spectral  Radius 

5-pt  point 

Rowwise  or  Rcd/Black 

l-2nh 

5-pt  line 

Rowwise  or  Red/Black 

\-2<2 nh 

9-pt  point 

Ordering  #1 

1-1.7971/!  (1-2.35tiA) 

9-pt  point 

Ordering  #2 

1-1.7971/1  (l-2.357t/i) 

9-pt  point 

Ordering  #3 

1-1.60kH  (l-2.45rr/r) 

9-pt  point 

Rowwise 

1-1.79TC/!  (1-2.3577/!) 

9-pt  line 

Rowwise  or  Red/Black 

1-2^2kH 

Figure  5.2.  Spectral  Radius  for  Point  and  Line  SOR  Methods 


Figure  5.2  shows  that,  based  on  the  actual  spectral  radius,  the  line  methods  converge  faster  than  the 
point  methods  for  both  the  5  and  9  point  stencils.  The  9-pt  line  SOR  method  has  the  same  conver¬ 
gence  rate  as  the  5-pt  line  SOR  method  for  small  h;  whereas  the  5-pt  point  SOR  method  with  a  con¬ 
sistent  ordering  can  be  expected  to  converge  slightly  faster  than  any  of  the  9-pt  point  SOR  methods 
that  we  analyzed.  However,  the  convergence  rate  that  will  be  observed  in  practice  with  smooth  initial 
data  for  the  9-pt  point  methods  will  be  closer  to  Garabedian’s  results.  That  is,  we  can  still  expect  the 
9-pt  line  methods  to  converge  slightly  faster  than  the  point  methods,  but  now,  the  9-pt  point  methods 
will  converge  faster  than  the  5-pt  point  method.  This  later  fact  is  encouraging  since  the  9-point 
discretization  is  more  accurate  than  the  5-point  one. 

6.  Conclusions. 

The  SOR  method  with  several  orderings  and  a  pseudo  SOR  method  have  been  analyzed  for  the 
9-point  Laplacian.  The  Fourier  analysis  techniques  proposed  by  LeVeque  and  Trefethen[1986]  and 
separation  of  variables  techniques  were  used  to  determine  the  eigenvalues  and  the  eigenvectors  of 
these  methods. 
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We  examined  the  SOR  method  for  the  9-pt  Laplacian  using  the  natural  rowwise  ordering  and 
several  multicolor  orderings.  For  all  these  orderings,  we  gave  a  quartic  equation  for  the  square  root  of 
the  eigenvalues  as  a  function  of  the  frequencies  and  (0.  The  optimal  to  was  found  by  solving  this  quar¬ 
tic  numerically  for  all  our  orderings.  This  to  was  confirmed  for  some  orderings  by  asymptotically  solv¬ 
ing  the  quartic  corresponding  to  the  lowest  frequency  for  small  h.  The  numerical  results  indicate  that 
the  lowest  frequency  determines  the  convergence  rate,  but  so  far  we  have  not  proved  this. 

Our  results  were  confirmed  by  performing  the  SOR  iteration  with  an  initial  guess  corresponding 
to  the  eigenvector  associated  with  the  spectral  radius.  The  observed  rate  of  convergence  matched  that 
predicted  by  the  theory  to  five  decimal  places.  The  SOR  iteration  was  also  performed  by  using  a 
smooth  initial  guess,  obtained  by  discretizing  ( x-x2)(y-y 2)  for  various  stepsizes,  h.  In  these  cases, 
the  observed  convergence  rate  more  closely  resembled  that  predicted  by  Garabedian. 

The  results  also  show  that  different  orderings  of  the  same  coloring  can  lead  to  different  spectral 
radii:  R/B/G/O  and  R/B/O/G  for  the  coloring  in  6b  have  spectral  radii  of  l-1.79itA  and  l-1.60rt/t 
respectively.  For  smooth  initial  data,  these  two  orderings  also  led  to  different  effective  spectral  radii 
observed  in  practice;  namely,  l-2.35rth  and  l-2.45rth ,  respectively.  This  information  can  be  useful 
in  selecting  a  coloring,  an  ordering,  and  appropriate  initial  data  to  use  with  multi-color  SOR  on  parallel 
computers  (Adams  and  Ortega[1982]). 

An  analysis  of  the  pseudo  SOR  method  showed  that  the  optimal  co  occurs  when  the  high  and  low 
frequencies  cross  and  that  the  corresponding  spectral  radius  is  only  l-3n2h2.  This  is  inferior  to  both 
the  5-pt  and  9-pt  SOR  methods  we  analyzed.  In  addition,  for  small  h ,  the  pseudo  method  only  con¬ 
verges  for  Ckax5/3. 

The  5-pt  and  9-pt  point  and  line  SOR  methods  were  compared  for  the  model  problem  for  small 
h .  The  line  methods  converge  slightly  faster  than  the  point  methods.  The  9-pt  line  and  5-pt  line 
methods  have  the  same  asymptotic  rate  of  convergence,  but  the  5-pt  point  method  with  a  consistent 
ordering  was  1.12  times  faster,  based  on  the  spectral  radius,  and  1.23  times  slower,  based  on 
Garabedian' s  arguments,  than  the  best  9-pt  point  method  that  we  analyzed.  Hence,  for  a  smooth  initial 


37 


guess,  the  9-pt  point  methods  can  be  expected  to  be  more  accurate  and  converge  faster  than  the  5-pt 
point  method. 
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